Logistic Regression

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes), for example

  • 1 or 0
  • 10 or -10

In mathematics sigmoid behaves in similar manner for large number of value. Its equation is


In [3]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.axis('off')
plt.arrow(0, 0, 0.5, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k');



In [ ]:

General approach to logistic regression

  1. Collect: Any method.
  2. Prepare: Numeric values are needed for a distance calculation. A structured data format is best.
  3. Analyze: Any method.
  4. Train: We’ll spend most of the time training, where we try to find optimal coefficients to classify our data.
  5. Test: Classification is quick and easy once the training step is done.
  6. Use: This application needs to get some input data and output structured numeric values. Next, the application applies the simple regression calculation on this input data and determines which class the input data should belong to. The application then takes some action on the calculated class.

Pros / Cons

Pros:

  • Computationally inexpensive
  • easy to implement
  • knowledge representation
  • easy to interpret

Cons:

  • Prone to underfitting,
  • may have low accuracy

Works with:

  • Numeric values,
  • nominal values

In [ ]: